	LISP starts with an aspiration, which was to make
a language that would be suitable for doing artificial intelligence,
by which I meant at the time theorem proving, and also for symbolic
calculation, that is, for doing calculations where the data involve
symbolic information.  Now, I got the idea of using a list-structure
language from IPL, an information processing language that
was developed by Newell and Simon, and I first heard about it from
them in 1956.  We had a research conference on Artificial Intelligence
at Dartmouth College, where I was then, during the summer of 1956,
and they explained IPL, which stands for Information Processing
Language.  I don't know what number they were up to; it
eventually ended in IPL-V, but I don't think they were anywhere near
up to five at that time.  However, there were some things that I didn't
really like about IPL; it's really a rather obscure language.
Its syntax happens to be based on the format of a loader
for the JOHNNIAC computer, a long dead machine, and so
it's a rather peculiarly formatted language.  Moreover, it
does not allow applicative expressions, that is, f(g(x)),
and this seemed to me to be important for symbolic calculation.
Therefore I wanted the language to have that property in addition.
One of the things that occurred at this summer conference on
Artificial Intelligence was a proposal by Marvin Minsky for a
program to prove theorems in plane geometry by making use of
examples.  One of the people present at this conference, and one of
the organizers in fact, was Nat Rochester, who is now at IBM
here at Technology Square.  At that time he was the head of the
Information Research Department of IBM, and he decided that he would
undertake to realize this idea of Minsky's.  He hired a new Ph.D.
by the name of Herbert Gelernter to do it, and he also hired me as a
consultant, for I was already a consultant to IBM, to tell him how.
One of the things that was involved was a language for
manipulating symbolic expressions; that is, how to represent these
symbolic expressions, for example, the geometric assertion that two
triangles are congruent.

I proposed that they use FORTRAN, which was newly in existence at the time,
but that FORTRAN be fortified by some functions which were added
for the purpose.  And I had really a great deal of difficulty
thinking of what these functions ought to be, that is, for going
down the list structure, and finally I decided that one of the
important functions would be "contents of the address part of register
number so-and-so," which would be abbreviated CAR, and likewise the
decrement part, CDR.
Now these strange entities, address and decrement, are related to
the word characteristics of a now long-dead computer, namely
the IBM 704.  It had a 36-bit word which was divided into four
parts: a three-bit prefix, a fifteen-bit
decrement, a three-bit tag, and a fifteen-bit address.
The important thing for LISP is not why IBM chose to divide the
word up that way but the fact that there were special instructions
in the machine for extracting the two significant parts of the
word, the two fifteen-bit parts, so there you have CAR
and CDR.  At first I didn't let well enough alone; there were
also CPR and CTR, contents of the prefix part of register number
so-and-so and contents of the tag part of register number so-and-so.
These two functions would give three-bit quantities, while CAR and
CDR would give fifteen-bit quantities.  The virtue of these things
was, as you all know, that since the address part of the register
points to the first element of
the list structure, you can take CAR of CAR of something and
end up with this thing down here.  Now one thing that FORTRAN
did nicely was to allow you to combine functions by composition.
The other thing that was necessary was
the ability to make list structure, and there the idea was taken
from IPL, that is, having a free-storage list of all the registers
that were free, and being able to put the desired
contents into the two parts of the first register on this list,
making the free-storage list shorter by that amount.
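[In later Lisp notation, which did not yet exist at this point in the
story, the point about composition looks like this: compositions of
CAR and CDR walk down a list structure.]

    ; Compositions of CAR and CDR select successive elements.
    (car '(plus x y))               ; => PLUS, the first element
    (car (cdr '(plus x y)))         ; => X, first element of the rest
    (car (cdr (cdr '(plus x y))))   ; => Y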

	This led to something called CONS, and the original CONS had
four arguments, or at least my intent was that it would have four
arguments.  I don't remember at present what Gelernter and
friends did, but in any case there would have to be these two three-
bit parts and these two fifteen-bit parts. . . . . . . . . .  It turns
out in many applications that one is content to leave these two
parts 0, because it was a little bit difficult to see what to do with
these three-bit quantities, and so an operation called CONS2
was made that had only two arguments and constructed these things.
Now, curiously enough, while I saw that the applicative ability of
FORTRAN was valuable for going down into an expression, the
corresponding fact with regard to CONS didn't occur to me, so I
imagined CONS to be a subroutine that would put these things into
a word; somehow you would get that word from the front end of the
free-storage list and hold it in a register called "free". . . . . . . .
the free-storage list would refer to it.  You would then do the CONS
operation.

	Now, it was Gelernter who noticed, or maybe Gerberich, who
was the guy working for Gelernter as a programmer, I don't
know which, that if you made the CONS
subroutine into a function whose value was
the pointer to the word taken from the free-storage list, then you
also got
the ability to form complex expressions by composition of functions.
So if you want to form plus of x and y, (PLUS X Y), to represent
a sum, then you could do this with CONS of PLUS, CONS of x, CONS
of y.  So that was their contribution.
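[Again in later notation: once CONS returns its result as a value,
building the sum expression is just more composition.]

    ; Building (PLUS X Y) by composing CONS, as just described.
    (cons 'plus (cons 'x (cons 'y nil)))   ; => (PLUS X Y)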
They produced this language,
or rather really simply wrote a collection of subroutines for
FORTRAN, and used it.  This language was called FLPL, for FORTRAN List
Processing Language, and they used it to program the geometry programs,
and it had a certain further life; it lasted a while longer.  I
don't know when it finally became deceased.  I don't believe, for
example, that it was ever re-done for the IBM 360.  Okay, this was
FLPL, and that was the state of things in production up until, I
guess, the summer of 1958, which was just after I moved to MIT and
which I spent working at IBM.  There was another thing that's
involved here, and that's conditional expressions.  Conditional
expressions did not arise in connection with LISP at all.  They arose,
as far as I was concerned. . . . . .  I was involved
in writing a chess program the first year I was at MIT, and a chess
program has some need for conditional expressions, and so I
introduced a function which I called . . . ., with three arguments: a
Boolean argument and two others; this is of course going to become
the conditional expression.  Now it was introduced as a FORTRAN
function, and as such, in order to calculate it you had to calculate
all three quantities p, a, and b.  So it was by no means
suitable for use recursively, because the recursive argument would
always set off another evaluation.  And so I used it a little bit in
the chess program, but always with this feeling that it was a luxury,
because it allowed you to write things more compactly, but you had
better be sure that neither of these calculations was extensive, or
at least the one that you didn't want shouldn't be extensive; otherwise
you would find yourself doing a lot of unnecessary calculations. . . . .
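[A sketch of the difficulty, in later Lisp; FAKE-IF is a hypothetical
stand-in for the FORTRAN function described above, whose name the
transcript loses.  Being an ordinary function, all its arguments are
evaluated before it is entered, so it cannot terminate a recursion.]

    ; An ordinary three-argument function: arguments pre-evaluated.
    (defun fake-if (p a b) (if p a b))

    (defun fact-bad (n)                       ; never terminates: the
      (fake-if (= n 0)                        ; recursive argument is
               1                              ; evaluated even when
               (* n (fact-bad (- n 1)))))     ; n = 0

    (defun fact (n)                           ; a true conditional
      (cond ((= n 0) 1)                       ; evaluates only the
            (t (* n (fact (- n 1))))))        ; branch it selects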

	So the next step was to be able to use conditional
expressions with CAR, CDR and CONS, and the first LISP program,
and I say this because in a sense many features of the
language were devised for the purpose of writing this one program,
was a differentiation program, where you're interested in forming
the partial derivative of an expression with respect to a variable,
and so you get . . . . with respect to . . . ., and let me see if I
can write it in the M-expression notation of that time. . . .  Oh,
the notation for conditional expressions which I proposed then was
this:  (p → a, b). . . . . .  Okay, so that is the original
notation, and I used this in several papers.  This notation came
about in the following way.  I made a lot of propaganda for the
inclusion of conditional expressions in ALGOL; there was an ALGOL
committee in 1958 and another one in 1960, and in 1958 I lost, namely
the idea was too unfamiliar and I didn't explain it well enough,
so I did not succeed in getting this thing into ALGOL in 1958.
But in 1960 I did succeed in getting them to agree to it.  But
they scoffed at one thing, and that was this notation.  Namely,
they said any notations, as opposed to English words, have to be
standard mathematical notations.  And that's clearly not a standard
mathematical notation, so out with it.  And John Backus preferred
the if-then-else notation, which corresponded to the conventions which
ALGOL adopted. . . . .  Owing to the lack of a terminator here, this
has given various
people who parse ALGOL fits, but ALGOL 68 liked to stick a "fi"
out at the end here. . . . . .   Now, the curious thing about that
is that mathematicians who have . . . . . used conditional expressions
much prefer the old notation now.  So although I myself have abandoned
my own notation, almost all mathematicians, like Dana Scott and
so forth, who use conditional expressions in various ways have used
this old notation.


[Equations written on board here, not audible.]


	Mathematically, it's much simpler, if you want to write a
set of axioms for conditional expressions, to consider only the case
of the three-term conditional expression.  But this being originally
a programming thing rather than a mathematical thing, the
T-part came first.


	Now, at this point, after returning from IBM, the actual
Stanford Artificial Intelligence Project, excuse me,
the MIT AI Project, got started, with Minsky and me running it.  I
remember we told Wiesner that we wanted to start such a project,
and when he said, what do you need, we said we need a keypunch and
a secretary.  He said, fine, and how would you like some graduate
students?  Namely, the Research Laboratory of Electronics had
agreed, out of its empire money, to pay for six mathematics graduate
students each year, and we have no idea what to do with them.
So I think maybe we ended up with all six, and that sort of solved
Wiesner's administrative problems and got us nicely started.


	QUESTION:  Who were they?

	ANSWER:    Well, I can't remember who was in the first
batch.  Bobrow certainly, and Raphael, though Raphael certainly was
not in the first batch; Dave Luckham and Dave Park, and two Englishmen
who didn't know their way around, so they got stuck on a bus or
something; and there was a guy by the name of Bob Brayton who
now works for IBM.  Dave Park is in England, and Dave Luckham is a
Research Associate at Stanford.


	So then we got started on this, and my idea was to write
a compiler, which was a little bit intimidating to me because I
hadn't ever written a compiler before and didn't know how they
worked.  The example we had was FORTRAN, which had consumed
thirty man-years and was still growing, and thirty man-years
seemed like a lot back in those days.  Nowadays thirty man-years
doesn't seem like so much. . . .  But the thing we decided to
do first was simply to hand-code LISP, and particularly, for example,
to hand-code this, that is, to develop some conventions whereby
LISP definitions could be translated into machine language for the
IBM 704.  And the first point was to handle recursion, because
this is the very basis of the thing, which is required, so we
decided to do it differently than in IPL.  IPL had all data
as list structures, and their push-down lists were list
structure, and I decided to simplify that in order to make the
thing run faster, and to make the push-down lists simply
linear sequences of registers, and to provide one which was
called the public push-down list.  Now the idea of the adjective
"public" there was that there would also be the possibility of
"private" push-down lists which belonged to various programs.  But
in fact that was never realized.  The public push-down list remained
the only one used in LISP.  And then subroutine entry conventions
were adapted from the standard subroutine entry conventions, with
the SAVE subroutine that allowed you to designate a block of
temporary registers which you would put on the push-down list,
and an UNSAVE routine.  So when a subprogram
was entered, it would call SAVE, which put the current values of
these temporary registers on the
push-down list . . . ., and then just before
it left it would call UNSAVE. . . . .  Well, the question then
arose what to do about erasing.  IPL had a rather complex system
for erasing unwanted list structure.  It worked in the following
way. . . . .
There was an explicit erase operator, so that you could erase list
structure.  But in LISP, I . . . . list structure is shared,
so that if x is some variable . . . . . . . . . . . . . . . . . .

and so certainly erasing, that is, taking all these words and
putting them back on the free-storage list, is going to lead
to trouble, since if you erase x you would be erasing
something shared with y or . . . . .  So the way they handled this
in IPL was with
the notion of responsibility bits.  It was noted somewhere
in here, when you had . . . . ., whether you were responsible
for this list structure or not.  And if you were responsible for
it, then when you were erased it got erased.  If you were
not responsible for it, then when you were erased it didn't get
erased.  And it would sometimes be necessary, of course, to change
the responsibility for a substructure, and that had to be part
of the program.  And a common kind of bug in an IPL program would
be called a space-thief, which is where something would fail to
be erased; some list structure would fail to be erased.  A
variable would have a new value calculated for it and so forth,
and this list structure would be left hanging, and eventually the
program would stop because the machine had run out of free storage.
So that was the kind of bug that IPL programs were headed for, and
so there was Collins' idea, which was sort of invented
and . . . . . . . considered, the one Collins later used, which is
having a count, namely, of the number of times that a register is
referred
to by other list structures.  So when you erase a list structure
under that circumstance, where you have a pointer off somewhere, you
look at the count, and if the count is 0 then you just charge on
and erase; if the count is one or larger, then you reduce the count
by one and don't erase.  So you have the assurance that eventually,
when some list structure is not referred to by anybody, it will
get erased.  The . . . . . objection to that, or at least at that
time it seemed to me, was that the . . . . . word was
inconvenient for doing that; namely, there were six bits left
over, but they were in two different parts of the word, and anyway
six bits didn't always look like it would be enough.  You might have
a very popular list structure. . . . .  So therefore I thought about
garbage collection. . . .  Have they had garbage collection?  Next
week.
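[A sketch of the reference-count scheme just described, in modern
Lisp; the CELL structure and its field names are illustrative, not
the historical word layout.  The count records how many other
structures point at a cell, so a count of zero means the eraser
holds the last reference.]

    (defstruct cell head tail (refcount 0))   ; head/tail play CAR/CDR

    (defvar *free-list* nil)                  ; the free-storage list

    (defun erase (c)
      (when (cell-p c)
        (if (> (cell-refcount c) 0)
            (decf (cell-refcount c))          ; others remain: just
                                              ; drop this reference
            (progn (erase (cell-head c))      ; last reference: free
                   (erase (cell-tail c))      ; the children and return
                   (push c *free-list*)))))   ; the cell to free storage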




. . . . . all registers that contain list structure, and you have
this free-storage list, and you just use it; you never erase anything,
and when you run out of free storage then you do the following:
you go through all the list structure and you mark all the
list structure that is in use, that is, accessible from the program,
because this is all the list structure that could ever be got at
by taking CARs and CDRs of the variables in a program.  You mark
it by putting a bit in the word.  You also have to mark the
list structure which is accessible from the public push-down
list, and this involves a little care, because there are other
things on the public push-down list than pointers to list
structure.  For example, there are return addresses in the program,
and these are stored in the public push-down list.  And there would
be a certain temptation to put other things in the public
push-down list, such as floating-point numbers.  But floating-point
numbers would occupy a full word and therefore could have any value
whatsoever, and so we gave up on putting them on the public push-down
list, and put on it only things that are pointers or that can be
told from pointers by their
form.  So that was a constraint on the further development of LISP
that was a consequence of this decision.  Everyone else who has done
a LISP system has had the same problem of not being able to have
. . . . .  You can either format the public push-down list so that
things have flags that tell whether this block of registers is
something which is to be traced in the marking phase or not,
or else you have to be able to
tell the form of the things.  Okay, so garbage collection was
invented, and it has the great advantage over explicit erasure that
it is invisible.  Namely, the garbage collector is a subroutine of
the CONS routine; that is, it is what the CONS routine does when it
finds it doesn't have any free-storage list.  So the user needn't
even know about it.  I could tell a little story about that point.
However, at first it seemed that the user ought to know about it,
and the garbage collector seemed like a very new and shiny device, so
we had the garbage collector tell what it was doing; it would print
out a page. . . . .
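[A sketch of the marking phase in modern Lisp; MARKED and SET-MARK
are assumed names for testing and setting the mark bit in a word,
not the actual 704 code.]

    (defun mark (x)
      (cond ((atom x) nil)        ; atoms are handled separately
            ((marked x) nil)      ; already visited: stop, so shared
                                  ; structure is traced only once
            (t (set-mark x)       ; set the bit in this word
               (mark (car x))     ; trace the address part
               (mark (cdr x)))))  ; trace the decrement part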

. . . . . It didn't matter that there was an extra page in your
LISP program.  It didn't matter until we started using LISP on-
line, and I could tell about the first time we did that.  I was
interested in the possibilities of time-sharing, and so we invented
a system for the IBM 704 which was called time-stealing, and it
worked this way.  A teletype, actually a Flexowriter, was
attached to the IBM 704, and the operating system was gimmicked
so that it was running these batch jobs one after the other, but
between jobs it would ask this teletype if it had anything for
it to process, and if there was some stuff hanging around in the
buffer (there were interrupts on this thing) then it would do it.
So you could do this time-stealing, working away at this Flexo-
writer, and you'd like to do this during what was called a fast
batch run, where jobs were limited to one minute, because then you
knew you would be able to get in once a minute.  Only one user could
do this, and the system was much too cumbersome to ever be put into
use as a regular thing, but at least it was good for demonstrations.
And there was an Industrial Liaison Office symposium at which I
undertook to demonstrate this and also time-sharing.  And the way the
thing worked is that I sat in the computer room at this console,
and the audience was upstairs on the fourth floor of Building
26 in the conference room, and there were closed-circuit TVs
looking at this thing, and while I was giving my LISP spiel, banging
away and giving this example, the piece de resistance was supposed
to be determining whether a certain differential equation was exact.
And I had practiced a little bit, and it seemed to work, but in any
case I had done a few more examples this time, so I got started on
this thing, and in the middle it suddenly started typing out the
garbage-collector message at me.  The . . . . . assumed that there
were 80 characters on a line, and if it had not finished a line it
would type spaces until it got to 80 characters.  So
the garbage-collector message involved endless space, space, space,
space, carriage return: "Some interesting statistics are as follows:"
. . . . . from the garbage collector.  And furthermore, I was more or
less overcome with laughter, and I didn't know what my audience was
doing upstairs.  But they were somewhat puzzled.  I don't know whether
there was time to finish, because it had the habit of typing
a whole page in a very leisurely fashion.  So, what was there?  Well,
at this time there was a read program and a print program, so you
could read S-expressions off of cards and you could print
S-expressions.
Then there were these conventions; there were the subroutines
like CONS, CAR and CDR and . . . . . . . ., which were closed sub-
routines, and then you could write a LISP program as a machine-
language program for the 704.  So a normal program
would have some READs and some . . . . with programs in
machine language and PRINTs, and there was this aspiration that
there should be a compiler that would compile this language.

	The next thing that happened was theoretical, which was
that I wrote a paper on this thing, and in the course of writing
this paper I claimed that LISP was as good for doing the theory of
computability as Turing machines; in fact, it's even better
than Turing machines, because you could do all the essential
things faster; that is, you could do less complicated proofs
of the same things.  All that was involved in proving that was that
you could write the equivalent
of the universal Turing machine.  The universal Turing machine
is a Turing machine such that when you print a description of another
machine on the Turing machine's tape, it imitates that other
machine.  So the LISP equivalent of this would be a function that,
when given a description of a LISP
function, would calculate the value of that LISP function.
So in order to do this it required first of all a notation
in which LISP expressions, LISP functions, were describable
as LISP data.  Up to this point there had been no such notation.
So at that point, for the purpose of writing the paper, the notation
that you all know and hate was invented, which is to say, all right,
it has to come out . . . . . . . . . . . . . .


some such notation had to be concocted for the purpose of writing
this paper, to show that you could do things in LISP just as well
as you could with a Turing machine, that is, to prove this theorem,
the existence of a universal function.

	QUESTION:  Previously you'd written LISP functions in
FORTRAN or something?

	ANSWER:  No, in an informal M-expression language;
since there wasn't any compiler yet, no definite notation needed
to be . . . . . . . . . .

Okay, so this EVAL was written and published in the paper, and
Steve Russell said, look, why don't I program this EVAL, and we'd
have an interpreter.  And I said to him, ho, ho, you're
confusing theory with practice; this EVAL isn't intended to
. . . . . . . . . . . . . .  So he went ahead and did it.  That is,
he hand-compiled the EVAL in my paper into 704 machine code, fixing
bugs, and then advertised this as a LISP interpreter, which it
certainly was.  So at that point LISP had essentially the form that
it has today, the S-expression form.  The compiler project proceeded
rather slowly, and there was this kind of major false start, which
was the . . . . . . . compiler; I don't understand what was
wrong with it, but in any case it never got debugged.  But then, I
forget who, somebody else, maybe Dan Edwards, tried doing it
and produced the compiler and . . . . . the LISP assembly program.
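[The flavor of that EVAL, reduced to a toy in modern notation; this
is a sketch in the spirit of the paper's universal function, not its
text, and it deliberately omits the primitive operators (CAR, CDR,
CONS, ATOM, EQ) and all error handling.]

    (defun toy-eval (e a)                      ; A is the environment,
      (cond ((atom e) (cdr (assoc e a)))       ; an association list
            ((eq (car e) 'quote) (cadr e))
            ((eq (car e) 'cond) (toy-evcon (cdr e) a))
            (t (toy-apply (car e)
                          (mapcar (lambda (x) (toy-eval x a)) (cdr e))
                          a))))

    (defun toy-evcon (cs a)                    ; first true clause wins
      (if (toy-eval (caar cs) a)
          (toy-eval (cadar cs) a)
          (toy-evcon (cdr cs) a)))

    (defun toy-apply (f args a)
      (if (atom f)                             ; a named function:
          (toy-apply (cdr (assoc f a)) args a) ; look up its definition
          (toy-eval (caddr f)                  ; ((LAMBDA vars body) ..):
                    (pairlis (cadr f) args a)))) ; bind and evaluate

    ; For example, (toy-eval '((lambda (x) x) (quote foo)) nil) => FOO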

	Now, there's one other thing worth describing in that
connection, which is LISP functions as arguments of functions.
That came in through the necessity of writing this differentiation
example, where you had sums and products of arbitrary numbers of
terms, and in that connection the maplist function was devised.
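[Roughly how the sum rule comes out with a functional argument, in
modern notation; MAPLIST applies its function to successive tails of
a list, and this DIFF handles only variables and sums.]

    (defun diff (y x)
      (cond ((atom y) (if (eq y x) 1 0))       ; d x/dx = 1, else 0
            ((eq (car y) 'plus)                ; (PLUS e1 e2 ...): a sum
             (cons 'plus                       ; of any number of terms
                   (maplist (lambda (z) (diff (car z) x))
                            (cdr y))))))

    ; (diff '(plus x y x) 'x) => (PLUS 1 0 1)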

	Now, . . . . . ., who was a graduate student working
on his thesis, innocently provoked a lot of trouble.  He wanted to
write down a function that would test whether a given predicate
was satisfied by any part of an S-expression.  And he coded it in
a nice recursive way, with functions as arguments, but
it didn't work.  So naturally he complained.  And I never understood
the complaint very well; namely, I said, oh, there must be a bug
somewhere, fix it.  And Steve Russell and Dan Edwards said there's
a fundamental difficulty, and I said, there can't be, fix it, and
eventually they fixed it, and there was something they called the
FUNARG device.  He tried to explain it to me, but I was distracted or
something and didn't pay much attention, so I didn't really learn
what the FUNARG thing did until really quite a few years later.
But then it turned out that these same kinds of difficulties
appeared in ALGOL, and at the time, the LISP solution to them, the
one that Steve Russell and Dan Edwards had simply cooked up in
order to fix this bug, was a more comprehensive solution to the
problem than any which existed at that time. . . .
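[A function of roughly the shape described, in modern notation; the
names are illustrative, not the student's code.  The functional
argument U acts as a failure continuation.  In a lexically scoped
Lisp this simply works; under the old dynamic binding, the variables
captured by the inner LAMBDA were what went wrong, and the FUNARG
device was the repair.]

    ; Does predicate P hold of some subexpression of X?  On success
    ; apply F to it; on failure call the continuation U.
    (defun testr (x p f u)
      (cond ((funcall p x) (funcall f x))
            ((atom x) (funcall u))
            (t (testr (cdr x) p f
                      (lambda () (testr (car x) p f u))))))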

	QUESTION:  When was that, about?

	ANSWER:  It was about 1960, probably.  Because I remember
that once I understood what the ALGOL problem was that we were
talking about, I was already able to brag that LISP had solved that
problem.

	Okay, so that was LISP for the 7090. . . .  There was LISP 1,
then there was an aspiration for LISP 2, which would have all possible
virtues, but it wasn't clear what all possible virtues were, so
LISP 1.5 was devised, which fixed up some minor things.  Let's see,
what did it fix up?  Well, LISP 1 had the trouble that it read from
cards at an incredibly slow rate.  The card reader went chunk,
chunk . . . , and the
reason it went at such a slow rate was that the atoms, that is, each
atom that appeared on a card, each string of letters
delimited by parentheses and spaces and so forth, had to be looked up
on a list of all atoms.  It had to search down a list of all atoms,
and for each of these you had to search, looking for the . . . . . on
the list, and that was what the card reader . . . . . .  So a hash
scheme was developed for bucket-sorting the atoms, and so . . . . . .
cards . . . . . . . . thereafter, although I would say that reading
is still one of the slowest parts: in a simple LISP program, reading
the program takes more computer time than executing it.
However, . . . .
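[The bucket idea in modern terms: intern atoms through a hash table
instead of searching one linear list.  *OBLIST* and INTERN-ATOM are
illustrative names, not the LISP 1.5 routines.]

    (defvar *oblist* (make-hash-table :test #'equal))

    (defun intern-atom (name)                 ; NAME is a string
      (or (gethash name *oblist*)             ; near-constant-time lookup
          (setf (gethash name *oblist*)       ; first occurrence: make
                (make-symbol name))))         ; and record the atom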


	I should mention a couple of other things.  One is the style
of LISP programming involving writing these recursively defined
functions as conditional expressions.  My original idea about that
was that there were some things that could be written that way, but
that the general way of writing would be a FORTRAN-like way. . . . .
Then, however, I got into the habit of using this way of writing
things, and now I hardly ever write any other way.  Although I don't
hold to the ideological view that it is wrong to write the other
way, which apparently some people do hold to.
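[The two styles on one function, in modern notation: the recursively
defined conditional-expression way and a FORTRAN-like way with a
loop and assignment.]

    (defun len-rec (x)                        ; recursive, conditional
      (cond ((null x) 0)
            (t (+ 1 (len-rec (cdr x))))))

    (defun len-loop (x)                       ; FORTRAN-like iteration
      (let ((n 0))
        (loop (when (null x) (return n))
              (setq n (+ n 1)
                    x (cdr x)))))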

	QUESTION:  Where did Lambda expressions come from?

	ANSWER:    I had read . . . . . at the beginning of Church's
paper and I made . . . . . . . . . and it seemed more elegant than
. . . . . . . . . . . . . . 

	I'll describe basically what this matter is; that is, that
LISP doesn't do this correctly.  Namely, what I had thought was that
if you write a recursive definition and regard this thing as a
functional equation for a function,
then this gave you the least defined partial function that
satisfies the functional equation.  And
that seemed totally obvious to me, although I never felt the need to
prove it, and it wasn't until last year that a couple of friends,
graduate students at Stanford, discovered that it wasn't even true,
and they had an example whereby it would not give you these fixed
points.  What they showed was that LISP uses call by value, and call
by name would at least have this virtue; and I think a
call-by-name . . . . . is no worse than a call-by-value EVAL as far
as computation time. . . . .  But regrettably enough, they have a
yet more fancy method of calling that does fewer calculations but is
a very hairy implementation and would take more calculation on simple
cases, so it leaves you. . . . .
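[The students' example is not preserved in the transcript; a standard
example of the same flavor, treated as the functional equation
f(x, y) = if x = 0 then 0 else f(x - 1, f(x, y)):  over the
nonnegative integers its least fixed point is the everywhere-zero
function, and call by name computes it, but under call by value the
inner call is evaluated before it is needed, so the computation
never terminates for any x > 0.]

    (defun f (x y)
      (cond ((= x 0) 0)                ; least fixed point: always 0
            (t (f (- x 1) (f x y)))))  ; call by value loops here
                                       ; for any x > 0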

	QUESTION:  Were SETQ and . . . . . an afterthought, or did
they get blended with the . . . . . ?

	ANSWER:    No, they weren't an afterthought.  They were kind
of more what I expected. . . . .  The fact that the purely . . . . .
is as powerful as it is, is a surprise, rather than the other way
around.  I must admit that the way it is in there looks like it
is an afterthought.  It doesn't have quite the same amount of . . .
. . . . .